Biostatistics For Dummies (Monika Wahi John Pezzullo)

Like Figure 18-7, every ROC graph has sensitivity running up the Y axis, displayed either as fractions between 0 and 1 or as percentages between 0 and 100. The X axis is presented either from left to right as 1 − specificity or, as in Figure 18-7, as specificity labeled backwards (from right to left) along the X axis.

Most ROC curves lie in the upper-left part of the graph area. The farther away from the

diagonal line they are, the better the predictive model is. For a nearly perfect model, the ROC

curve runs up along the Y axis from the lower-left corner to the upper-left corner, then along the

top of the graph from the upper-left corner to the upper-right corner.

Because of how sensitivity and specificity are calculated, the graph appears as a series of steps. If you

have a large data set, your graph will have more and smaller steps. For clarity, we show the cut values

for predicted probability as a scale along the ROC curve itself in Figure 18-7, but unfortunately, most

statistical software doesn’t do this for you.
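If your software doesn't label the cut values, you can tabulate them yourself. The following is a minimal sketch in Python, not the output of any particular statistics package. The function name `roc_points` and the tiny data set are made up for illustration, and a subject is assumed to be classified as positive when its predicted probability is at or above the cut value.

```python
def roc_points(probs, outcomes, cuts):
    """Return a (cut, sensitivity, specificity) tuple for each cut value.

    probs    -- predicted probabilities from a fitted model
    outcomes -- observed outcomes, 1 = event occurred, 0 = it did not
    cuts     -- cut values to evaluate
    Assumption: a subject is predicted positive when prob >= cut.
    """
    points = []
    for cut in cuts:
        tp = sum(1 for p, y in zip(probs, outcomes) if y == 1 and p >= cut)
        fn = sum(1 for p, y in zip(probs, outcomes) if y == 1 and p < cut)
        tn = sum(1 for p, y in zip(probs, outcomes) if y == 0 and p < cut)
        fp = sum(1 for p, y in zip(probs, outcomes) if y == 0 and p >= cut)
        sensitivity = tp / (tp + fn)  # fraction of true events detected
        specificity = tn / (tn + fp)  # fraction of non-events correctly ruled out
        points.append((cut, sensitivity, specificity))
    return points


# Made-up illustrative data (not the data behind Figure 18-7)
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

for cut, sens, spec in roc_points(probs, outcomes, [0.0, 0.25, 0.5, 0.75, 1.0]):
    # Plot sensitivity (Y) against 1 - specificity (X) to draw the ROC curve
    print(f"cut={cut:.2f}  sensitivity={sens:.2f}  1-specificity={1 - spec:.2f}")
```

Each cut value produces one point on the curve. Because sensitivity and specificity can change only when the cut value crosses one of the observed predicted probabilities, the plotted points form steps.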

Looking at the ROC curve helps you choose a cut value that gives the best tradeoff between

sensitivity and specificity:

To have very few false positives: Choose a higher cut value to give a high specificity. Figure 18-7 shows that by setting the cut value to 0.6, you can simultaneously achieve about 93 percent specificity and 87 percent sensitivity.

To have very few false negatives: Choose a lower cut value to give higher sensitivity. Figure 18-7 shows that if you set the cut value to 0.3, you can reach almost 100 percent sensitivity, but your specificity will be only about 75 percent, meaning you’ll have a 25 percent false positive rate.
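The two rules above can be sketched as a small lookup over a table of cut values. This is only an illustration: the helper `choose_cut` is hypothetical, and the two rows are the approximate values read off Figure 18-7 in the text, with 0.99 standing in for "almost 100 percent" sensitivity.

```python
def choose_cut(points, require, minimum):
    """Pick one row from a list of (cut, sensitivity, specificity) tuples.

    require -- "sensitivity" or "specificity": the measure that must be
               at least `minimum`
    Among the rows that qualify, return the one with the highest value of
    the other measure. Return None if no row qualifies.
    """
    index = {"sensitivity": 1, "specificity": 2}
    keep = index[require]
    other = 3 - keep  # position of the measure being traded off
    candidates = [pt for pt in points if pt[keep] >= minimum]
    if not candidates:
        return None
    return max(candidates, key=lambda pt: pt[other])


# Approximate values quoted in the text for Figure 18-7;
# 0.99 is an assumed stand-in for "almost 100 percent" sensitivity.
points = [
    (0.3, 0.99, 0.75),
    (0.6, 0.87, 0.93),
]

# Very few false positives: insist on at least 90 percent specificity
print(choose_cut(points, "specificity", 0.90))

# Very few false negatives: insist on at least 95 percent sensitivity
print(choose_cut(points, "sensitivity", 0.95))
```

With a full table of cut values, the same lookup shows how much of one measure you give up to guarantee the other.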

The software may optionally display the area under the ROC curve (abbreviated AUC), along with its standard error and a p value. The AUC is another measure of how good the predictive model is. The diagonal line has an AUC of 0.5, and the p value comes from a statistical test comparing your AUC to that of the diagonal line. At α = 0.05, a p value below 0.05 indicates that your model is statistically significantly better than the diagonal line at predicting your outcome.
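To see where these numbers can come from, here is a rough sketch in Python. It computes the AUC as the fraction of (event, non-event) pairs in which the event received the higher predicted probability, which equals the area under the stepped ROC curve. The standard error uses the Hanley and McNeil approximation and the p value uses a normal approximation. Both are assumptions made for illustration, and your software may use a different method. The function names and the data are made up.

```python
import math


def auc_by_pairs(probs, outcomes):
    """AUC as the share of (event, non-event) pairs ranked correctly.

    Ties in predicted probability count as half a correct ranking.
    """
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_z_test(auc, n_pos, n_neg):
    """Return (standard error, two-sided p value) for AUC versus 0.5.

    Assumption: Hanley-McNeil standard error with a normal approximation,
    which is unreliable for very small samples.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(max(variance, 0.0))
    if se == 0.0:
        return se, None  # AUC of exactly 0 or 1: the approximation breaks down
    z = (auc - 0.5) / se  # 0.5 is the AUC of the diagonal line
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, p_value


# Made-up illustrative data (far too small for a real analysis)
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

auc = auc_by_pairs(probs, outcomes)
se, p_value = auc_z_test(auc, n_pos=outcomes.count(1), n_neg=outcomes.count(0))
print(f"AUC={auc:.4f}  SE={se:.4f}  p={p_value:.2g}")
```

A model that ranks events and non-events no better than chance gives an AUC near 0.5, matching the diagonal line, while a model that ranks every event above every non-event gives an AUC of 1.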

Heads Up: Knowing What Can Go Wrong with Logistic Regression

Logistic regression presents many of the same potential pitfalls as ordinary least-squares regression

(see Chapters 16 and 17), as well as several that are specific to logistic regression. Watch out for

some of the more common pitfalls:

Don’t fit a logistic function to non-logistic data: Don’t use logistic regression to fit data that

doesn’t behave like the logistic S curve. Plot your grouped data (as shown earlier in Figure 18-